(a) The ROC curve of the random forest model constructed for the breast cancer
data. The AUC was 0.99. (b) The ROC curve using the randomForest package for the
[...] data without any encoding. The AUC was 0.878. (c) The ROC curve using the
randomForest package for the factor Xa protease cleavage data without any encoding. The AUC
50. A tree developed using the party package for the breast cancer data.
When dealing with a protease cleavage pattern discovery problem, in
which peptides are non-numerical, there are two ways to construct a
random forest model. Using raw peptides as input is one way, which is
shown above and in Figure 3.49(b) and Figure 3.49(c). In such a random
forest model, the residue variables can be ranked in addition to building a
classification model.
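
The raw-peptide approach can be sketched in R as follows. This is a minimal illustration, not the code behind Figure 3.49: the peptides are randomly generated, the column names pos1 to pos8 are hypothetical, and the labelling rule (cleaved when position 4 is R or K) is an arbitrary toy rule, not a statement about any real protease. It assumes the randomForest package is installed. Each residue position is passed as a factor, so no numerical encoding is applied.

```r
library(randomForest)

set.seed(1)

# Hypothetical toy data: 400 random 8-residue peptides (illustration only)
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
n <- 400
width <- 8

# One factor column per residue position, all sharing the same 20 levels
cols <- lapply(seq_len(width), function(i) {
  factor(sample(amino_acids, n, replace = TRUE), levels = amino_acids)
})
names(cols) <- paste0("pos", seq_len(width))
residues <- as.data.frame(cols)

# Arbitrary toy labelling rule so that one position carries the signal
label <- factor(ifelse(residues$pos4 %in% c("R", "K"),
                       "cleaved", "non-cleaved"))

# Fit the forest directly on the factor-valued residue variables
rf <- randomForest(x = residues, y = label, ntree = 500, importance = TRUE)

# Rank residue variables by mean decrease in Gini impurity (type = 2)
imp <- importance(rf, type = 2)
ranking <- imp[order(imp[, "MeanDecreaseGini"], decreasing = TRUE), ,
               drop = FALSE]
print(ranking)
```

In this toy setting the position that determines the label should come out at the top of the ranking. With real cleavage data the ranking would instead indicate which residue positions contribute most to the classification.
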
However, if it is required to discover which peptides are the most
important ones for discriminating between cleaved and non-cleaved
peptides, the random forest algorithm can be used to rank cleaved peptides